If you’re interested … tomelliott.co.nz/phd
MBIE Endeavour grant
Colin Simpson (VUW), Barry Milne (COMPASS), Andrew Sporle
Informatics for Social Services and Wellbeing …
more later!
Honorary position here (thanks James)
my side-project since 2013/14
shifting focus as audience has evolved
See Chris Wild’s talks featuring hits like We Will Plot You
for organisations/groups with low/no money/time/both
recent focus on surveys — now handled natively!
key goal is removal of barriers
Data
GUI
Explore
Export results/code
data is from a survey?
iNZight isn’t much better … or is it?!
(Remember survey variables never have nice names)
data = "mysurvey.csv"
weights = "wt0"
repweights = "^w[0-4]"
reptype = "JK1"
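One way to read the specification above is as the arguments to `survey::svrepdesign()`. The sketch below is only an illustration under that assumption, not iNZight's own implementation: the data are synthetic stand-ins for `mysurvey.csv`, and the column names `y`, `group`, and `w1` to `w4` are hypothetical.

```r
library(survey)

# Synthetic stand-in for "mysurvey.csv": 20 respondents in 4 jackknife groups.
set.seed(1)
n_groups <- 4
mysurvey <- data.frame(
  y = rnorm(20, mean = 50, sd = 10),        # hypothetical outcome variable
  group = rep(seq_len(n_groups), each = 5), # hypothetical jackknife groups
  wt0 = 10                                  # full-sample weight
)

# Replicate k gives group k zero weight and scales the rest by G / (G - 1).
for (k in seq_len(n_groups)) {
  mysurvey[[paste0("w", k)]] <-
    ifelse(mysurvey$group == k, 0, mysurvey$wt0 * n_groups / (n_groups - 1))
}

# The same four pieces of information as in the specification.
des <- svrepdesign(
  data = mysurvey,
  weights = ~wt0,
  repweights = "^w[0-4]", # regular expression: matches w1..w4 but not wt0
  type = "JK1"
)

est <- svymean(~y, des)
est
```

Note that the pattern `^w[0-4]` does not pick up `wt0`, because the second character has to be a digit.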
iNZight (GUI interface, collects user input, displays results)
iNZightModules (UI for time series, regression, maps, …)
iNZightPlots (graphs, summaries, inference)
iNZightTools (utility functions, data wrangling)
iNZightTS (time series)
iNZightMR (multiple response)
iNZightRegression (model summaries, residual plots)
iNZightMaps (lat/lng points, fill-in-the-shapefile maps)
plus vit and some others …
wrapper functions make programming GUIs easier
packages don’t need GUI
iNZightPlots::inzplot()
simple functions aimed towards novice coders
returns the R code
GUI \(\rightarrow\) high level functions \(\rightarrow\) lower-level (e.g., ggplot)
library(iNZightTools)
iris_filtered <- filterNumeric(iris, "Sepal.Width", "<", 3.5)
head(iris_filtered)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 4.9 3.0 1.4 0.2 setosa
## 2 4.7 3.2 1.3 0.2 setosa
## 3 4.6 3.1 1.5 0.2 setosa
## 4 4.6 3.4 1.4 0.3 setosa
## 5 5.0 3.4 1.5 0.2 setosa
## 6 4.4 2.9 1.4 0.2 setosa
## [1] "iris %>% dplyr::filter(Sepal.Width < 3.5)"
## # A tibble: 3 x 3
## Species Sepal.Length_median Sepal.Length_var
## <fct> <dbl> <dbl>
## 1 setosa 5 0.124
## 2 versicolor 5.9 0.266
## 3 virginica 6.5 0.404
## [1] "iris %>% dplyr::group_by(Species) %>% dplyr::summarize(Sepal.Length_median = median(Sepal.Length, "
## [2] " na.rm = TRUE), Sepal.Length_var = var(Sepal.Length, na.rm = TRUE), "
## [3] " .groups = \"drop\")"
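The output above shows wrapper functions that return a result together with the R code that produced it. A minimal base-R sketch of that pattern follows. The names `filter_numeric_sketch()` and `get_code()` are hypothetical, and the real iNZightTools functions generate dplyr code rather than `subset()` calls.

```r
# Hypothetical helpers for illustration; NOT the iNZightTools implementation.
filter_numeric_sketch <- function(data, var, op, value) {
  # Capture the name of the data object as the caller wrote it.
  data_name <- deparse(substitute(data))
  stopifnot(
    is.character(var), length(var) == 1, var %in% names(data),
    op %in% c("<", "<=", ">", ">=", "==", "!="),
    is.numeric(value), length(value) == 1
  )

  # Build the code as text, e.g. "subset(iris, Sepal.Width < 3.5)".
  code_text <- sprintf("subset(%s, %s %s %s)", data_name, var, op, format(value))

  # Run that exact code in the caller's environment, so the stored
  # script is guaranteed to match what was actually executed.
  result <- eval(parse(text = code_text), envir = parent.frame())
  attr(result, "code") <- code_text
  result
}

# Retrieve the code stored alongside a result.
get_code <- function(x) attr(x, "code")

iris_small <- filter_numeric_sketch(iris, "Sepal.Width", "<", 3.5)
get_code(iris_small)
# [1] "subset(iris, Sepal.Width < 3.5)"
```

Because the result carries its own code, a GUI can collect these strings as the user works and export them as a script.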
modified wrapper functions to handle surveys
refactored GUI to pass around a ‘data-thing’ (data or survey)
library(survey)
data(api, package = "survey")
dclus2 <- svydesign(
  id = ~dnum + snum,
  fpc = ~fpc1 + fpc2,
  data = apiclus2
)
dclus2_filtered <- filterNumeric(dclus2, "api99", ">=", 700)
code(dclus2_filtered)
## [1] "dclus2 %>% srvyr::as_survey() %>% srvyr::filter(api99 >= 700)"
Big thanks to the ‘srvyr’ package!
How does this all relate to my postdoc?
Rourou = basket
Nā tō rourou, nā taku rourou, ka ora ai te iwi.
(With your food basket and my food basket the people will thrive.)
Tātaritanga = analysis
Te Rourou Tātaritanga
“Tools for analytics and sharing data for the betterment of communities.”
Or: “Informatics for Social Services and Wellbeing”
Improve data standards
Promote Māori data sovereignty
Develop systems to support access
Evaluate synthesising of datasets
Security and privacy implications
Machine learning and AI methods
database connecting data across NZ’s sectors
high security environment
but also other unnecessary barriers: coding!
high school and/or university
no coding necessary
easy to learn and relearn
iNZight in Stats NZ data lab …? Watch this space!
Start confined to (example) small data sets …
primary researcher: SQL \(\Rightarrow\) CSV
non-coding researchers: graphs, tables, …
… and build from there!
iwi groups, Pacific nations, etc. with specific needs
population summaries (tables of counts)
regression models
demographic models …
easy to learn and relearn
repeat analyses after 6 months / 2 years
no (or low) (re)training or consultation costs
produces R code script
Why limit yourself to tables when you can fit hierarchical Bayesian models with model-specific priors, likelihoods, … ?
John Bryant’s R packages (dembase, demest, …) for Bayesian demography
R coding required (and data transformations, working with multi-dimensional arrays, …)
so we tested out iNZight’s new add-on system …
Both work and ‘fun’
to get access to the IDI, you need to put together a research proposal
putting together a research proposal requires knowing what data is available to investigate
that data is hidden away in the IDI
simple web app (ReactJS)
searchable database (schema, table name, variable name, descriptions where available)
prospective (and current) IDI researchers can explore what’s available
the display in 302 was broken
rebuilt it (again) using ReactJS + d3
uses newly available real-time occupancy
long-term goal: a prototype of iNZight built with ReactJS and Rserve
a single app for Windows / macOS / Linux / web
connecting to local/remote R server (user permissions, firewall, etc.)
GitHub: tmelliott | iNZightVIT | terourou
Twitter: @tomelliottnz | @iNZightUoA | @terourou
tomelliott.co.nz | inzight.nz | terourou.org